Cognitive Modeling with Context Sensitive Reinforcement Learning
Authors
Abstract
A reinforcement learning system is typically described as a black box that receives two types of input: the current state, S, and the current reinforcement, R. From these two inputs, the system has to work out a policy that determines which action to perform in each state to maximize the reinforcement received in the future (Sutton & Barto, 1998). The expected future reinforcement can be estimated either as the sum of all future reinforcement or with an exponentially decaying time horizon. It is also possible to take into account only the reinforcement received at the next goal action, which results in finite-horizon algorithms (e.g., Balkenius & Morén, 1999). Learning is viewed as the formation of associations between states and actions, represented by numerical values that are changed during learning. In the most basic reinforcement learning algorithms, the policy for each state is learned individually, without regard for the similarity between different states. It would obviously be valuable if actions learned in one state could be generalized to other, similar states. Such generalization can be introduced into a reinforcement learning algorithm in several ways. One possibility is to code the similarity between states by similar state vectors. Such methods have been proposed by Sutton (1996), who used a tile representation of the underlying state space, and by Balkenius (1996), who used a multi-resolution representation. An alternative is to learn the underlying state representation during exploration, based on the closeness of different states (Dayan, 1993). In both cases, learning becomes faster since each learning instance is generalized to many similar states. In many cases, it makes sense to divide the state input into two parts: one that codes for the situation or context, and one that codes for the part of the state that controls the action (cf. Balkenius & Hulth, 1999; Houghes & Drogoul, 2001). If such a combined representation is used together with the reinforcement learning algorithms described above, learning will generalize not only to similar states but also to similar contexts. The roles of state and context will thus be symmetric.
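As a rough illustration of the kind of algorithm the abstract describes, the sketch below implements a plain tabular Q-learning rule over combined (context, state) inputs, where the exponentially decaying time horizon corresponds to the discount factor gamma in the update target r + gamma * max_a Q(context', state', a). This is a minimal sketch under assumed conventions, not the paper's own algorithm: all names (ContextualQLearner, alpha, gamma, epsilon) are illustrative, and a discrete table omits the similarity-based generalization across states and contexts that the paper is actually about.

import random
from collections import defaultdict

class ContextualQLearner:
    """Tabular Q-learning over a combined (context, state) input.
    Illustrative sketch only; names and defaults are assumptions."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions    # available actions
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # exponential discount (decaying time horizon)
        self.epsilon = epsilon    # exploration rate
        # Numerical association strengths between (context, state) pairs and actions.
        self.q = defaultdict(float)

    def policy(self, context, state):
        """Epsilon-greedy action choice for the combined (context, state) input."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(context, state, a)])

    def update(self, context, state, action, reward, next_context, next_state):
        """One-step Q-learning update toward the discounted future reinforcement."""
        best_next = max(self.q[(next_context, next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        key = (context, state, action)
        self.q[key] += self.alpha * (target - self.q[key])

A representation-based variant in the spirit of the abstract would replace the discrete (context, state) key with feature vectors, so that each update also strengthens associations for similar states and similar contexts, for example via tile coding (Sutton, 1996) or a multi-resolution code (Balkenius, 1996).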
Similar resources
Translating a Reinforcement Learning Task into a Computational Psychiatry Assay: Challenges and Strategies
Computational psychiatry applies advances from computational neuroscience to psychiatric disorders. A core aim is to develop tasks and modeling approaches that can advance clinical science. Special interest has centered on reinforcement learning (RL) tasks and models. However, laboratory tasks in general often have psychometric weaknesses and RL tasks pose special challenges. These challenges m...
When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition
Reinforcement learning approaches to cognitive modeling represent task acquisition as learning to choose the sequence of steps that accomplishes the task while maximizing a reward. However, an apparently unrecognized problem for modelers is choosing when, what, and how much to reward; that is, when (the moment: end of trial, subtask, or some other interval of task performance), what (the object...
Cognitive Modeling of Action Selection Learning
Our goal is to develop a hybrid cognitive model of how humans acquire skills on complex cognitive tasks. We are pursuing this goal by designing hybrid computational architectures for the NRL Navigation task, which requires competent sensorimotor coordination. In this paper, we describe results of directly fitting human execution data on this task. We next present and then empirically compare two ...
Social stress reactivity alters reward and punishment learning.
To examine how stress affects cognitive functioning, individual differences in trait vulnerability (punishment sensitivity) and state reactivity (negative affect) to social evaluative threat were examined during concurrent reinforcement learning. Lower trait-level punishment sensitivity predicted better reward learning and poorer punishment learning; the opposite pattern was found in more punis...
Cognitive flexibility in adolescence: Neural and behavioral mechanisms of reward prediction error processing in adaptive decision making during development
Adolescence is associated with quickly changing environmental demands which require excellent adaptive skills and high cognitive flexibility. Feedback-guided adaptive learning and cognitive flexibility are driven by reward prediction error (RPE) signals, which indicate the accuracy of expectations and can be estimated using computational models. Despite the importance of cognitive flexibility d...
Journal title:
Volume Issue
Pages -
Publication date: 2004